Gaussian mixture distribution




Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors

Sefidgaran, Milad, Zaidi, Abdellatif, Krasnowski, Piotr

arXiv.org Machine Learning

We establish in-expectation and tail bounds on the generalization error of representation-learning-type algorithms. The bounds are in terms of the relative entropy between the distribution of the representations extracted from the training and "test" datasets and a data-dependent symmetric prior, i.e., the Minimum Description Length (MDL) of the latent variables for the training and test datasets. Our bounds are shown to reflect the "structure" and "simplicity" of the encoder and to improve significantly upon the few existing bounds for the studied model. We then use our in-expectation bound to devise a suitable data-dependent regularizer, and we thoroughly investigate the important question of how to select the prior. We propose a systematic approach for simultaneously learning a data-dependent Gaussian mixture prior and using it as a regularizer. Interestingly, we show that a weighted attention mechanism emerges naturally in this procedure. Our experiments show that our approach outperforms the now-popular Variational Information Bottleneck (VIB) method as well as the recent Category-Dependent VIB (CDVIB).
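A minimal PyTorch-style sketch of the general idea of regularizing a stochastic encoder with a learnable Gaussian mixture prior: the KL term is estimated by Monte Carlo, and the mixture weights come from a softmax, loosely mirroring the attention-style weighting the abstract mentions. Component counts, parameterization, and the one-sample estimator are illustrative assumptions, not the authors' exact construction.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianMixturePriorRegularizer(nn.Module):
    """Monte-Carlo estimate of KL(q(z|x) || p(z)) for a learnable Gaussian mixture prior p.

    Illustrative sketch only; the paper's data-dependent prior is more involved.
    """

    def __init__(self, latent_dim: int, num_components: int = 8):
        super().__init__()
        self.means = nn.Parameter(torch.randn(num_components, latent_dim))
        self.log_stds = nn.Parameter(torch.zeros(num_components, latent_dim))
        self.logits = nn.Parameter(torch.zeros(num_components))  # mixture weights

    def log_prior(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, D) -> log p(z) under the mixture, via logsumexp over components.
        diff = z.unsqueeze(1) - self.means                       # (batch, K, D)
        comp_logp = (-0.5 * (diff / self.log_stds.exp()) ** 2
                     - self.log_stds
                     - 0.5 * math.log(2 * math.pi)).sum(-1)      # (batch, K)
        log_w = F.log_softmax(self.logits, dim=0)                # softmax weighting over components
        return torch.logsumexp(comp_logp + log_w, dim=-1)

    def forward(self, mu: torch.Tensor, log_std: torch.Tensor) -> torch.Tensor:
        # One-sample reparameterized estimate of KL(q(z|x) || p(z)).
        eps = torch.randn_like(mu)
        z = mu + log_std.exp() * eps
        log_q = (-0.5 * eps ** 2 - log_std - 0.5 * math.log(2 * math.pi)).sum(-1)
        return (log_q - self.log_prior(z)).mean()

# usage: total_loss = task_loss + beta * regularizer(mu, log_std)
```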


Assessing Uncertainty in Stock Returns: A Gaussian Mixture Distribution-Based Method

Wang, Yanlong, Xu, Jian, Huang, Shao-Lun, Sun, Danny Dongning, Zhang, Xiao-Ping

arXiv.org Artificial Intelligence

This study seeks to advance the understanding and prediction of stock market return uncertainty through the application of advanced deep learning techniques. We introduce a novel deep learning model that utilizes a Gaussian mixture distribution to capture the complex, time-varying nature of asset return distributions in the Chinese stock market. By incorporating the Gaussian mixture distribution, our approach effectively characterizes short-term fluctuations and non-traditional features of stock returns, such as skewness and heavy tails, that are often overlooked by traditional models. Compared to GARCH models and their variants, our method demonstrates superior performance in volatility estimation, particularly during periods of heightened market volatility. It provides more accurate volatility forecasts and offers unique risk insights for different assets, thereby deepening the understanding of return uncertainty. Additionally, we propose a novel use of code embeddings, which applies a bag-of-words approach to train hidden representations of stock codes and transforms the uncertainty attributes of stocks into high-dimensional vectors. These vectors are subsequently reduced to two dimensions, allowing the observation of similarity among different stocks. This visualization facilitates the identification of asset clusters with similar risk profiles, offering valuable insights for portfolio management and risk mitigation. Since we predict the uncertainty of returns by estimating their latent distribution, evaluating the predicted return distribution is challenging when the true distribution is unobservable. We therefore measure it with the CRPS, which assesses how well the predicted distribution matches the realized returns, and with the MSE and QLIKE metrics, which evaluate the error between the volatility level of the predicted distribution and proxy measures of true volatility.
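A minimal sketch of the kind of mixture-density output head such a model could use: a linear projection produces mixture weights, means, and log-standard deviations for the return distribution, trained by negative log-likelihood. The layer sizes, component count, and loss are illustrative assumptions, not the paper's exact architecture.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureDensityHead(nn.Module):
    """Toy mixture-density head producing a Gaussian mixture over next-period returns."""

    def __init__(self, in_dim: int, num_components: int = 5):
        super().__init__()
        self.proj = nn.Linear(in_dim, 3 * num_components)  # weights, means, log-stds

    def forward(self, h: torch.Tensor):
        logits, means, log_stds = self.proj(h).chunk(3, dim=-1)
        return F.log_softmax(logits, dim=-1), means, log_stds.clamp(-7, 2)

def mixture_nll(log_w, means, log_stds, y):
    """Negative log-likelihood of observed returns y under the predicted mixture."""
    y = y.unsqueeze(-1)                                     # (batch, 1) vs (batch, K)
    comp_logp = (-0.5 * ((y - means) / log_stds.exp()) ** 2
                 - log_stds - 0.5 * math.log(2 * math.pi))
    return -torch.logsumexp(log_w + comp_logp, dim=-1).mean()

# usage with hypothetical encoder features h and realized returns y:
# log_w, mu, log_std = head(h); loss = mixture_nll(log_w, mu, log_std, y)
```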


Equivariant Masked Position Prediction for Efficient Molecular Representation

An, Junyi, Qu, Chao, Shi, Yun-Fei, Liu, XinHao, Tang, Qianwei, Cao, Fenglei, Qi, Yuan

arXiv.org Artificial Intelligence

Graph neural networks (GNNs) have shown considerable promise in computational chemistry. However, the limited availability of molecular data raises concerns regarding GNNs' ability to effectively capture the fundamental principles of physics and chemistry, which constrains their generalization capabilities. To address this challenge, we introduce a novel self-supervised approach termed Equivariant Masked Position Prediction (EMPP), grounded in intramolecular potential and force theory. Unlike conventional attribute masking techniques, EMPP formulates a nuanced position prediction task that is better defined and enhances the learning of quantum mechanical features. EMPP also bypasses the approximation of the Gaussian mixture distribution commonly used in denoising methods, allowing for more accurate acquisition of physical properties. Experimental results indicate that EMPP significantly enhances the performance of advanced molecular architectures, surpassing state-of-the-art self-supervised approaches.

Graph neural networks (GNNs) have found widespread application in computational chemistry. However, unlike other fields such as natural language processing (NLP), the limited availability of molecular data hampers the development of GNNs in this domain. For example, one of the largest molecular datasets, OC20 (Chanussot et al., 2021), contains only 1.38 million samples, and collecting more molecular data with ab initio calculations is both challenging and expensive. To address this limitation, molecular self-supervised learning has gained increasing attention. This approach enables molecular GNNs to learn more general physical and chemical knowledge, enhancing performance in various computational chemistry tasks, such as drug discovery (Hasselgren & Oprea, 2024) and catalyst design (Chanussot et al., 2021). Current self-supervised methods for molecular learning fall into two mainstream categories: masking and denoising. Masking methods (Hu et al., 2020; Hou et al., 2022; Inae et al., 2023) adapt the concept of masked token prediction from NLP to graph learning, where graph information, such as node attributes, is masked instead of tokens. However, there are two major limitations: underdetermined reconstruction and lack of deep quantum mechanical (QM) insight.
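A deliberately simplified, non-equivariant toy sketch of the masking idea behind position prediction: hide one atom's coordinates and regress them from a pooled embedding of the visible atoms. The pooling network, shapes, and loss are illustrative assumptions; the actual EMPP formulation and its equivariant architecture are considerably more involved.

```python
import torch
import torch.nn as nn

class MaskedPositionRegressor(nn.Module):
    """Toy masked-position predictor: hide one atom and regress its coordinates."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(3, hidden), nn.SiLU())
        self.head = nn.Linear(hidden, 3)

    def forward(self, positions: torch.Tensor, mask_idx: int) -> torch.Tensor:
        # positions: (num_atoms, 3); the masked atom's coordinates are never seen.
        visible = torch.cat([positions[:mask_idx], positions[mask_idx + 1:]], dim=0)
        centroid = visible.mean(dim=0)
        pooled = self.embed(visible - centroid).mean(dim=0)   # permutation-invariant pooling
        return centroid + self.head(pooled)                   # predicted masked position

positions = torch.randn(10, 3)                     # random toy "molecule"
model = MaskedPositionRegressor()
pred = model(positions, mask_idx=3)
loss = (pred - positions[3]).pow(2).sum()          # position-reconstruction loss
```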


Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection

Adams, Steven, Patanè, Andrea, Lahijanian, Morteza, Laurenti, Luca

arXiv.org Machine Learning

Infinitely wide or deep neural networks (NNs) with independent and identically distributed (i.i.d.) parameters have been shown to be equivalent to Gaussian processes. Because of the favorable properties of Gaussian processes, this equivalence is commonly employed to analyze neural networks and has led to various breakthroughs over the years. However, neural networks and Gaussian processes are equivalent only in the limit; in the finite case there are currently no methods available to approximate a trained neural network with a Gaussian model with bounds on the approximation error. In this work, we present an algorithmic framework to approximate a neural network of finite width and depth, and with not necessarily i.i.d. parameters, with a mixture of Gaussian processes, together with bounds on the approximation error. In particular, we consider the Wasserstein distance to quantify the closeness between probabilistic models and, by relying on tools from optimal transport and Gaussian processes, we iteratively approximate the output distribution of each layer of the neural network as a mixture of Gaussian processes. Crucially, for any NN and $\epsilon > 0$ our approach is able to return a mixture of Gaussian processes that is $\epsilon$-close to the NN at a finite set of input points. Furthermore, we rely on the differentiability of the resulting error bound to show how our approach can be employed to tune the parameters of a NN to mimic the functional behavior of a given Gaussian process, e.g., for prior selection in the context of Bayesian inference. We empirically investigate the effectiveness of our results on both regression and classification problems with various neural network architectures. Our experiments highlight how our results can represent an important step towards understanding neural network predictions and formally quantifying their uncertainty.
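A small illustrative building block for this line of work: the closed-form 2-Wasserstein distance between two multivariate Gaussians, which is the kind of quantity used when comparing Gaussian (process) approximations at a finite set of input points. This is a standalone sketch, not the paper's full layer-wise mixture-of-GP algorithm.

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2(m1, C1, m2, C2):
    """Closed-form 2-Wasserstein distance between N(m1, C1) and N(m2, C2)."""
    mean_term = np.sum((m1 - m2) ** 2)
    C2_half = sqrtm(C2)
    cross = sqrtm(C2_half @ C1 @ C2_half)          # (C2^1/2 C1 C2^1/2)^1/2
    cov_term = np.trace(C1 + C2 - 2 * np.real(cross))
    return np.sqrt(max(mean_term + cov_term, 0.0))

# toy usage: distance between a 2-D standard Gaussian and a shifted, scaled one
m1, C1 = np.zeros(2), np.eye(2)
m2, C2 = np.ones(2), 2.0 * np.eye(2)
print(gaussian_w2(m1, C1, m2, C2))
```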


Generative neural networks for characteristic functions

Brück, Florian

arXiv.org Machine Learning

The characteristic function is one of the fundamental objects in probability theory, since it uniquely characterizes the distribution of a real-valued random vector in a concise way. Its properties often allow one to simplify theoretical derivations, especially when sums of independent random variables are investigated. Further, it also allows one to easily derive certain properties of the underlying random vector, such as its moments. A disadvantage of working with characteristic functions is that simulation from the corresponding random vector is not straightforward when there is no further information about the underlying random vector. As Devroye comments on simulation from a (univariate) characteristic function in [12]: "If the characteristic function is known in black-box format, very little can be done in a universal manner". This poses major challenges in applications, since simulation from the corresponding random vector is often essential to assess certain quantities of interest. Several approaches to simulate from a random vector that corresponds to a given characteristic function seem to naturally come to mind. There are various ways of "inverting" the characteristic function to obtain its corresponding (Lebesgue) density or distribution function, such as the Fourier inversion formula, Lévy's characterization theorem and several other variants thereof.
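A minimal sketch of one natural way to attack this problem with a generative network: train a generator so that the empirical characteristic function of its samples matches a given target characteristic function at randomly drawn frequencies. The target (a standard normal), the frequency sampling, and the squared loss are illustrative assumptions, not the construction proposed in the paper.

```python
import torch
import torch.nn as nn

def target_cf(t):
    # Characteristic function of a d-dimensional standard normal: exp(-||t||^2 / 2).
    return torch.exp(-0.5 * (t ** 2).sum(-1)), torch.zeros(t.shape[0])

def empirical_cf(samples, t):
    # Empirical E[exp(i <t, X>)] over generated samples, split into real/imaginary parts.
    proj = samples @ t.T                           # (n_samples, n_freqs)
    return torch.cos(proj).mean(0), torch.sin(proj).mean(0)

dim = 2
gen = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, dim))
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)

for _ in range(200):
    x = gen(torch.randn(512, 4))                   # push noise through the generator
    t = torch.randn(64, dim)                       # random evaluation frequencies
    tr, ti = target_cf(t)
    er, ei = empirical_cf(x, t)
    loss = ((er - tr) ** 2 + (ei - ti) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```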


Counterfactual Explanation via Search in Gaussian Mixture Distributed Latent Space

Zhao, Xuan, Broelemann, Klaus, Kasneci, Gjergji

arXiv.org Artificial Intelligence

Counterfactual Explanations (CEs) are an important tool in Algorithmic Recourse for addressing two questions: 1. What are the crucial factors that led to an automated prediction/decision? 2. How can these factors be changed to achieve a more favorable outcome from a user's perspective? Thus, guiding the user's interaction with AI systems by proposing easy-to-understand explanations and easy-to-attain feasible changes is essential for the trustworthy adoption and long-term acceptance of AI systems. In the literature, various methods have been proposed to generate CEs, and different quality measures have been suggested to evaluate these methods. However, the generation of CEs is usually computationally expensive, and the resulting suggestions are unrealistic and thus non-actionable. In this paper, we introduce a new method to generate CEs for a pre-trained binary classifier by first shaping the latent space of an autoencoder to be a mixture of Gaussian distributions. CEs are then generated in latent space by linear interpolation between the query sample and the centroid of the target class. We show that our method maintains the characteristics of the input sample during the counterfactual search. In various experiments, we show that the proposed method is competitive on different quality measures across image and tabular datasets: it efficiently returns results that lie closer to the original data manifold than those of three state-of-the-art methods, a property that is essential for realistic high-dimensional machine learning applications.
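A minimal sketch of the interpolation step described above, assuming a pre-trained `encoder`/`decoder` pair whose latent space has been shaped into a Gaussian mixture, a frozen binary `classifier` over single-sample batches, and a known latent `target_centroid` for the desired class. The plain line search is illustrative; the paper's procedure is richer.

```python
import torch

def latent_counterfactual(x, encoder, decoder, classifier, target_centroid,
                          target_class: int = 1, steps: int = 20):
    """Walk from the query latent toward the target-class centroid until the
    decoded sample flips the classifier's decision."""
    z = encoder(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        z_cf = (1 - alpha) * z + alpha * target_centroid   # linear interpolation in latent space
        x_cf = decoder(z_cf)
        if classifier(x_cf).argmax(dim=-1).item() == target_class:
            return x_cf                                     # earliest counterfactual found
    return decoder(target_centroid)                         # fall back to the centroid itself
```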


RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling

Deng, Jingcheng, Pang, Liang, Shen, Huawei, Cheng, Xueqi

arXiv.org Artificial Intelligence

Retrieval-augmented language models show promise in addressing issues like outdated information and hallucinations in language models (LMs). However, current research faces two main problems: 1) determining what information to retrieve, and 2) effectively combining retrieved information during generation. We argue that valuable retrieved information should not only be related to the current source text but also consider the future target text, given the nature of LMs that model future tokens. Moreover, we propose that aggregation using latent variables derived from a compact latent space is more efficient than utilizing explicit raw text, which is limited by context length and susceptible to noise. Therefore, we introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE). It encodes the text corpus into a latent space, capturing current and future information from both source and target text. Additionally, we leverage the VAE to initialize the latent space and adopt the probabilistic form of the retrieval generation paradigm by expanding the Gaussian prior distribution into a Gaussian mixture distribution. Theoretical analysis provides an optimizable upper bound for RegaVAE. Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
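A minimal sketch of the idea of expanding a Gaussian prior into a mixture built from retrieved latents: components are centered at the retrieved latent vectors and weighted by their similarity to the query latent. The similarity-softmax weighting and shared variance are illustrative assumptions, not RegaVAE's exact retrieval-generation formulation.

```python
import math
import torch
import torch.nn.functional as F

def retrieval_mixture_log_prior(z, retrieved_mu, query_mu, sigma: float = 1.0):
    """log p(z) under a Gaussian mixture whose components sit at retrieved latents.

    z: (batch, D) latent samples; retrieved_mu: (K, D) retrieved latent means;
    query_mu: (D,) latent mean of the current source text.
    """
    log_w = F.log_softmax(retrieved_mu @ query_mu, dim=0)             # similarity-based weights (K,)
    diff = z.unsqueeze(1) - retrieved_mu                              # (batch, K, D)
    comp_logp = (-0.5 * (diff / sigma) ** 2
                 - math.log(sigma) - 0.5 * math.log(2 * math.pi)).sum(-1)
    return torch.logsumexp(comp_logp + log_w, dim=-1)                 # (batch,)

# usage in a VAE-style objective:
# kl_term ≈ (log_q_z_given_x - retrieval_mixture_log_prior(z, retrieved_mu, query_mu)).mean()
```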


Gradual Domain Adaptation via Normalizing Flows

Sagawa, Shogo, Hino, Hideitsu

arXiv.org Machine Learning

In a standard problem of learning predictive models, it is assumed that the probability distributions of the test data and the training data are the same. The prediction performance generally deteriorates when this assumption does not hold. The simplest solution is to discard the training data and collect new samples from the distribution of the test data. However, this solution is inefficient and sometimes impossible, and there is a strong demand for utilizing the valuable labeled data in the source domain. Domain adaptation (Ben-David et al., 2007) is a transfer learning framework for settings in which the probability distributions of the prediction target data and the training data differ. In domain adaptation, the source domain is a distribution with many labeled samples, and the target domain is a distribution with few or no labeled samples. The case with no labels from the target domain is called unsupervised domain adaptation and has been the subject of much research, including theoretical analysis and real-world application (Ben-David et al., 2007; Cortes et al., 2010; Mansour et al., 2009; Redko et al., 2019; Zhao et al., 2019). In domain adaptation, the predictive performance on the target data deteriorates when the discrepancy between the source and target domains is large.